ABSTRACT

This study compared the effectiveness of several 
existing and proposed methods for statistically adjusting college 
grade point averages (GPAs) for course and departmental differences 
in grading standards, using first-semester grades from an entire 
entering class at a large state university (4,351 students), in 1988. 
Most of the adjusted GPAs produced by these methods functioned 
similarly and, despite high correlations with actual GPA, had greater 
internal-consistency reliability than actual GPA and were more 
predictable from Scholastic Aptitude Test (SAT) scores and high 
school rank (HSR). Most of the adjusted GPAs also functioned 
similarly with regard to sex differences in over-underprediction. The 
adjusted GPAs and actual GPAs exhibited the same small but 
significant sex differences in over-underprediction by SAT scores, 
but the adjusted GPAs displayed smaller differences than actual GPAs 
in over-underprediction by SAT scores and HSR. Seven tables present 
analysis results. (Contains 44 references.) (Author/SLD) 

Abstract 

This study compared the effectiveness of several existing and proposed methods 
for statistically adjusting college GPAs for course and departmental 
differences in grading standards, using first-semester grades from an entire 
entering class at a large state university. Most of the adjusted GPAs 
produced by these methods functioned similarly and, despite high correlations 
with actual GPA, had greater internal-consistency reliability than actual GPA 
and were more predictable from SAT scores and high school rank (HSR). Most of 
the adjusted GPAs also functioned similarly with regard to sex differences in 
over-underprediction. The adjusted GPAs and actual GPA exhibited the same 
small but significant sex differences in over-underprediction by SAT scores, 
but the adjusted GPAs displayed smaller differences than actual GPA in 
over-underprediction by SAT scores and HSR. 



Adjusting College Grade-Point Average for Variations in Grading Standards 

College grade-point average (GPA), though originally intended for 
administrative purposes (Smallwood, 1935), is widely employed in educational 
and psychological research, particularly as a criterion for validating 
admissions measures (e.g., see the reviews by Breland, 1981; Fishman & 
Pasanella, 1960; Lavin, 1965). 

Despite the popularity of GPA, it is generally recognized that this is a 
fallible index of academic performance (e.g., see the reviews by Milton, 
Pollio, & Eison, 1986; Warren, 1971; Willingham, 1990). A major problem is 
that GPA is based on a different set of courses for each student, and the 
grading standards are not uniform from course to course, a phenomenon that has 
been observed for many years (e.g., Meyer, 1908). Hence, GPA is not 
comparable for students who take courses with severe grading standards and 
students who take courses with lenient standards, and its reliability and 
validity are attenuated. 

Differences in grading standards have been rigorously documented among 
departments (Anderhalter, 1962; de Nevers, 1984; Elliott & Strenta, 1988; 
Frisbee, 1984; Gamson, 1967; Goldman & Hewitt, 1975; Goldman, Schmidt, Hewitt, 
& Fisher, 1974; Goldman & Widawski, 1976; Juola, 1968; Prather & Smith, 1976; 
Prather, Smith, & Kodras, 1979; Ramist, Lewis, & McCamley, 1990; Sabot & 
Wakeman-Linn, 1991; Strenta & Elliott, 1987; Willingham, 1985), as well as 
within departments (Garrison, 1979; Juola, 1968). 

The consequences of variations in grading standards on the reliability 
and validity of GPA are suggested by studies that attempted to adjust GPA for 
differences in these standards. The adjustments increased the median 
correlation between yearly GPAs from .67 to .72 (Elliott & Strenta, 1988). 
The adjustments also generally boosted the correlations of admissions measures 
with GPA: the multiple correlation of the Scholastic Aptitude Test (SAT; 
Donlon, 1984) scores and high school GPA with four-year GPA increased from .58 
to .64 (Young, 1990b), and the correlations of the total SAT score (combining 
the Verbal [V] and Mathematical [M] scores) with four-year GPA went from .43 
to .50 (Strenta & Elliott, 1987), but the multiple correlation of SAT scores 
and HSR with first-semester GPA increased from only .42 to .44 (Stricker, 
Rock, & Burton, 1991). 

Several statistical methods that directly or indirectly adjust GPAs for 
differences in grading standards have been developed in recent years. Goldman 
and Widawski (1976) devised a within-subjects procedure that compared average 
course grades earned by students who took courses in different pairs of 
departments and then adjusted grades for the difference in these averages. 
(This procedure was subsequently used by Strenta and Elliott, 1987.) 

Elliott and Strenta (1988) extended the Goldman and Widawski (1976) 
procedure, not only comparing corresponding average course grades earned by 
students who took courses in different pairs of departments but also comparing 
corresponding average course grades for students who took different courses in 
the same departments and adjusting grades for the differences in both 
averages. 

Young (1990a, 1990b) applied item response theory (IRT) methods to course 
grades, treating the grades like polytomously scored item responses (Muraki, 
1990), to secure estimated "thetas" (scores on the latent trait underlying the 
grades) for three fields (humanities, social sciences, and natural sciences 
and engineering) and then combined the three estimated thetas into a composite 
measure. 

Stricker et al. (1991) employed a regression procedure to residualize 
average course grades for the characteristics of the students enrolled in the 
courses (high school honors courses taken in various fields, intended college 
majors, percentage of college-bound seniors in their high schools) and 
adjusted grades for the residual. 

Other procedures are also applicable to this problem. The discrepancy 
between the average grade in a course and the average predicted overall GPA 
(predicted from admissions test scores and high school record) for students in 
the course could be used to adjust course grades. Such an index (with SAT 
scores and high school GPA as predictors), the "grade-residual mean," was used 
recently by Ramist et al. (1990) to assess the leniency or severity of the 
grading standards for courses. Variants of this index were employed for the 
same general purpose in previous studies. Anderhalter (1962) used the 
discrepancy between the average grade for a department and average predicted 
overall GPA (based on admissions test scores and HSR) to evaluate departments' 
grading standards. And Juola (1968) employed the difference between the 
average grade for a course and average actual overall GPA (in other courses) 
to assess courses' grading standards. 
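
As a rough illustration of the grade-residual idea described above, the following sketch computes, for each course, the mean grade minus the mean predicted overall GPA of the students enrolled, and then removes that discrepancy from each grade. The data layout, the function names, and the direction of the adjustment (subtracting the discrepancy) are assumptions made for illustration; they are not details taken from Ramist et al. (1990).

```python
from collections import defaultdict

def grade_residual_means(records):
    """records: iterable of (course, grade, predicted_gpa) tuples.

    Returns {course: mean grade - mean predicted GPA}. A positive value
    suggests lenient grading relative to the students' predicted GPAs.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for course, grade, predicted in records:
        s = sums[course]
        s[0] += grade
        s[1] += predicted
        s[2] += 1
    return {c: (g - p) / n for c, (g, p, n) in sums.items()}

def adjust_grades(records, residual_means):
    """Subtract each course's grade-residual mean from its grades (assumed direction)."""
    return [(course, grade - residual_means[course]) for course, grade, _ in records]

# Hypothetical toy data: one lenient course and one severe course.
records = [
    ("ART101", 4.0, 3.0), ("ART101", 3.0, 2.0),
    ("PHY101", 2.0, 3.0), ("PHY101", 1.0, 2.0),
]
residuals = grade_residual_means(records)
adjusted = adjust_grades(records, residuals)
```

In this toy case the two courses enroll students with identical predicted GPAs, so the adjusted grades coincide across courses even though the raw grades differ by two points.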

The unavailable grades for courses that students do not take could be 
treated as a missing data problem (Little & Rubin, 1987), with the missing 
grades considered as "missing at random" in the sense that they are 
predictable from available grades. The missing grades can be imputed by 
maximum likelihood methods, using the EM algorithm (Dempster, Laird, & Rubin, 
1977), generating a complete set of grades for all students. 

The Stricker et al. (1991) regression method could be modified to 
eliminate student characteristics that are specific to disciplines and hence 
may undercorrect for departmental differences in grading standards. 

These existing and proposed methods for adjusting GPA differ in whether 
they rely on "internal" data (other grades) or "external" data (other, 
nongrade variables); the methods also vary in their complexity and 
sophistication. But nothing is known about the methods' relative 
effectiveness in improving the psychometric properties of GPA. Accordingly, 
the main aim of the present study was to compare these methods with regard to 
their intercorrelations, reliability, and correlations with admissions 
measures. A secondary purpose was to assess the effects of these methods on 
sex differences in over-underprediction, for several of the methods have been 
applied to this problem (Elliott & Strenta, 1988; Stricker et al., 1991; 
Young, 1991). A final goal was to explore the efficacy of a novel approach, 
suggested by Ramist et al. (1990), for predicting GPA from the cumulated 
predictions of individual course grades. 

Method 

Sample 

The sample consisted of 4,351 students (2,318 women and 2,033 men) in the 
Fall 1988 entering class at a large state university's main campus. The 
sample was limited to full-time freshmen enrolled in the seven undergraduate 
schools: three liberal arts colleges and four professional schools. This is 
the same sample used in the Stricker et al. (1991) study. 

Actual and Adjusted GPAs 

First-semester grades in all degree-credit courses were used. (No Credit 
grades assigned to students in one of the liberal arts colleges in lieu of Fs 
were treated as Fs, and temporary grades were treated the same as permanent 
ones.) The cohort enrolled in 498 courses in 86 departments. These grades 
were also used in the Stricker et al. (1991) study. 

Whenever possible, adjusted GPAs were based on the 140 individual and 
pooled courses used with the original Stricker et al. (1991) procedure, 
described subsequently, which provides a means of adjusting grades for all 
courses, regardless of their size. (The IRT GPA, as used originally by Young, 
1990a, 1990b, was restricted to courses with a minimum size.) The exceptions 
were the within-subjects GPA, which had provisions for dealing with the size 
problem, and the imputed GPA, which was computationally impractical to apply 
to the 140 individual and pooled courses. The actual and adjusted GPAs 
derived from the grades were based on all grades and weighted by the number of 
credit hours per course, unless otherwise noted. 

Actual GPA. This GPA is based on actual, unadjusted grades; it is the 
same variable used in the Stricker et al. (1991) study. 

Within-subjects GPA. The Elliott and Strenta (1988) procedure was 
followed with the 498 individual courses. Between-department adjustments were 
made for 53 individual departments and a pooled department that combined 
departments with fewer than ten grades. (Between-department discrepancies 
were weighted by the number of students involved.) Within-department 
adjustments were made in 12 departments for 33 individual courses and a pooled 
course that combined courses in the same department that had fewer than ten 
grades. 

IRT GPA. The Young (1990a, 1990b) procedure was followed with the same 
140 individual and pooled courses used in the original Stricker et al. (1991) 
method. These are 119 individual courses with available data for ten or more 
students and 21 pooled courses that combine individual courses to achieve 
sample sizes of ten or more students: 20 pooled courses made up of individual 
courses combined by department and one pooled course comprised of individual 
courses combined across departments. Courses were categorized as humanities, 
social sciences, natural sciences, or other, using an adaptation of the 
university's department classification employed in the Stricker et al. (1991) 
study. The IRT analyses, employing the PARSCALE program (Muraki & Bock, 
1991), were done separately for the major categories of courses: 55 
humanities courses (N = 3,874), 32 social sciences courses (N = 2,639), and 44 
natural sciences courses (N = 3,342). (Two humanities courses with no 
variation in grades, four "other" courses, and the pooled course that combined 
individual courses across departments were excluded.) The estimated thetas 
for the three fields were then standardized, using the data for the 1,651 
students with all three estimated thetas. An analog to GPA was computed, 
weighting each standardized estimated theta by the number of credit hours in 
the same course category. (The composite measure used by Young, 1990b, 1991, 
was computed differently: it is the weighted average of the unstandardized 
estimated thetas, each estimated theta weighted by the square root of the 
reciprocal of its standard error of estimate.) 
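
The standardization and credit-hour weighting described in this paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: the array layout, the use of NaN for a field in which a student has no estimated theta, the sample (n - 1) standard deviation, and the function name `irt_gpa` are all hypothetical choices, not details reported in the study.

```python
import numpy as np

def irt_gpa(thetas, credits):
    """Credit-weighted mean of standardized thetas for each student.

    thetas, credits: arrays of shape (n_students, 3), one column per field;
    NaN marks a field in which the student has no estimated theta.
    """
    thetas = np.asarray(thetas, dtype=float)
    credits = np.asarray(credits, dtype=float)
    # Standardize using only students who have all three estimated thetas.
    complete = ~np.isnan(thetas).any(axis=1)
    mean = thetas[complete].mean(axis=0)
    sd = thetas[complete].std(axis=0, ddof=1)
    z = (thetas - mean) / sd
    # Fields with no theta get zero weight; nansum then skips their NaN products.
    w = np.where(np.isnan(z), 0.0, credits)
    return np.nansum(z * w, axis=1) / w.sum(axis=1)

# Hypothetical example: the third student has no theta in the second field.
example_thetas = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [1.0, np.nan, 3.0]]
example_credits = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
example_gpa = irt_gpa(example_thetas, example_credits)
```

The resulting values are on a standard-score scale rather than the usual grade scale, which is why mean differences on the IRT GPA are not directly comparable to those on the other GPAs.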

Imputed GPA. Maximum likelihood estimates of the GPAs in each of 53 
individual departments and a pooled department that combined departments with 
fewer than ten GPAs were obtained with the BMDP AM program (Frane, 1990), 
using available GPAs for the 54 departments. (Using department GPAs instead 
of course grades facilitated estimation by reducing the size and sparseness of 
the student-by-grade data matrix.) An unweighted overall GPA was computed. 
(GPA was unweighted because of the unavailability of the number of credit 
hours per department.) 

Original regression GPA. This GPA is based on the original Stricker et 
al. (1991) procedure, applied to 140 individual and pooled courses. This is 
the same variable used in the Stricker et al. (1991) study. 

Modified regression GPA. The original Stricker et al. (1991) procedure 
was modified by changing some of the variables that describe the students in 
each course. Three variables were employed: Percentage with High School 
Honors Courses in Any Field,¹ Percentage of College-Bound Seniors in their 
High School, and Percentage with Data on High School Honors Courses. The 
source of the data on honors courses was the Student Descriptive 
Questionnaire, completed by students when they registered for the SAT, and 
recorded in the university's Longitudinal Data Base (LDB). The source of the 
data on college-bound seniors was the Attending Institution Profile Survey of 
high school officials conducted by Educational Testing Service in 1988; the 
students' high schools were recorded in the LDB. Percentage with Data on High 
School Honors Courses was included to adjust for the effect of missing data on 
Percentage with High School Honors Courses in Any Field by capitalizing on the 
information inherent in the presence or absence of data for the latter 
variable (J. Cohen & P. Cohen, 1983). (Data on this variable were missing for 
25.0% of the sample; data on college-bound seniors were missing for only 1.4%.) 
The same 140 individual and pooled courses used with the original Stricker et 
al. (1991) procedure were employed. 

Grade-residual GPA. The Ramist et al. (1990) method was followed with 
the individual and pooled courses used in the original Stricker et al. (1991) 
procedure. (HSR was substituted for the unavailable high school GPA used by 
Ramist et al., 1990.) The difference between mean course grade and mean 
predicted overall GPA was then applied to the grades of each student in the 
course, including those without predicted GPAs. 

Predicted GPA. The predicted GPA proposed by Ramist et al. (1990) was 
calculated, following their method for predicting individual course grades and 
using the 140 courses in the original Stricker et al. (1991) procedure. (HSR 
was substituted for the high school GPA used by Ramist et al., 1990.) For 
each course, a regression equation of SAT-V, SAT-M, and HSR against course 
grade was calculated (deleting predictors with negative correlations with 
course grade or negative regression coefficients), and predicted course grades 
were obtained with the equation. A GPA was computed from these predicted 
grades. 
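
One way to picture the per-course prediction step is the sketch below, which fits a least-squares equation for a single course, drops predictors that correlate negatively with the course grade or receive a negative coefficient, and then averages a student's predicted grades across courses. The refitting loop after each drop, the credit-hour weighting of the predicted grades, and the function names are assumptions made for illustration; the source does not spell out these details.

```python
import numpy as np

def fit_course_equation(X, y):
    """Least-squares equation for one course with sign screening.

    X: (n_students, n_predictors), e.g. columns for SAT-V, SAT-M, HSR.
    y: course grades. Returns (intercept, coefficients), where dropped
    predictors get a coefficient of zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only predictors that correlate positively with the course grade.
    keep = [j for j in range(X.shape[1]) if np.corrcoef(X[:, j], y)[0, 1] > 0]
    while True:
        design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in keep])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        negative = [j for j, b in zip(keep, beta[1:]) if b < 0]
        if not negative:
            break
        # Assumed rule: drop all negatively weighted predictors, then refit.
        keep = [j for j in keep if j not in negative]
    coefs = np.zeros(X.shape[1])
    coefs[keep] = beta[1:]
    return beta[0], coefs

def predicted_gpa(student_x, equations, credits):
    """Credit-weighted mean of a student's predicted grades in the courses taken."""
    x = np.asarray(student_x, dtype=float)
    preds = [a + b @ x for a, b in equations]
    return float(np.average(preds, weights=credits))

# Hypothetical course: the second predictor correlates negatively and is dropped.
X_demo = [[0, 3], [1, 2], [2, 1], [3, 0]]
y_demo = [1.0, 1.5, 2.0, 2.5]
intercept_demo, coefs_demo = fit_course_equation(X_demo, y_demo)
gpa_demo = predicted_gpa([2, 5], [(intercept_demo, coefs_demo)] * 2, [3, 1])
```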

Other Variables 

Sex, SAT scores, and HSR were obtained from the LDB; HSR was converted to 
normalized T scores. The original source of these variables, also used in the 
Stricker et al. (1991) study, was official records. 

Analyses 

Analyses were conducted for the actual and adjusted GPAs and for two 
kinds of residualized actual and adjusted GPAs that represented 
over-underprediction. One kind of residualized GPA used predictions from SAT-V 
and SAT-M; the other kind used predictions from SAT-V, SAT-M, and HSR. The 
predictions were made with regression equations for students in the cohort 
with complete data on the particular set of predictors (SAT scores or SAT 
scores and HSR) and GPA. (The same analyses of over-underprediction were 
conducted in the Stricker et al., 1991, study.) 
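
A residualized GPA of the kind described above can be sketched as the difference between a student's GPA and the GPA predicted by a least-squares regression on the chosen predictors. The function name and the toy numbers below are hypothetical; the sketch assumes complete data on the predictors and GPA, as in the analyses described.

```python
import numpy as np

def residualize(gpa, predictors):
    """Return GPA minus its least-squares prediction from the predictors.

    A positive residual means the student earned a higher GPA than predicted
    (underprediction); a negative residual means overprediction.
    """
    gpa = np.asarray(gpa, dtype=float)
    X = np.column_stack([np.ones(len(gpa)), np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
    return gpa - X @ beta

# Hypothetical scores for four students and a single predictor.
sat_demo = [400, 500, 600, 700]
gpa_values_demo = [2.2, 2.3, 3.2, 3.3]
resid_demo = residualize(gpa_values_demo, sat_demo)
```

Group means of these residuals (for example, for women and men) then serve as the over-underprediction measures; by construction the residuals average zero over the students used to fit the equation.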

Similar analyses were done for the predicted GPA measure and an analogous 
measure of over-underprediction: actual GPA residualized for predicted GPA. 
Because predicted GPA is derived from SAT scores and HSR, the corresponding 
residualized GPA measure is included only in analyses of GPAs residualized for 
both kinds of predictors. 

Product-moment intercorrelation matrices were computed, using missing 
data procedures (each correlation was based on all available students), and 
multiple correlations were calculated from these matrices. 
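
Computing a multiple correlation directly from a correlation matrix uses the standard identity R² = r' Rxx⁻¹ r, where Rxx holds the predictor intercorrelations and r the predictor-criterion correlations. The sketch below illustrates this with hypothetical values (the predictor intercorrelations in this study are not reported in the text, so none are assumed here); the function name is likewise illustrative.

```python
import numpy as np

def multiple_r(R, predictors, criterion):
    """Multiple correlation of `criterion` with `predictors`.

    R: full correlation matrix; predictors: list of row/column indices;
    criterion: index of the criterion variable.
    """
    R = np.asarray(R, dtype=float)
    Rxx = R[np.ix_(predictors, predictors)]   # predictor intercorrelations
    rxy = R[predictors, criterion]            # predictor-criterion correlations
    r_squared = rxy @ np.linalg.solve(Rxx, rxy)
    return float(np.sqrt(r_squared))

# Hypothetical matrix: two uncorrelated predictors correlating .30 and .40
# with the criterion, so R = sqrt(.09 + .16) = .50.
R_demo = [[1.0, 0.0, 0.3],
          [0.0, 1.0, 0.4],
          [0.3, 0.4, 1.0]]
multiple_r_demo = multiple_r(R_demo, [0, 1], 2)
```

When each correlation is based on a different set of available students, the matrix is not guaranteed to be internally consistent, which is one reason to use a conservative N in significance tests.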

Because of the large sample size, both statistical and practical 
significance were considered in assessing the results. The .01 level was used 
throughout in view of the sample size. (The total N for significance tests of 
multiple correlations was the smallest N for any of the zero-order 
correlations involved.) A minimum effect size (J. Cohen, 1988) was used that 
accounted for 1% of the variance (e.g., a correlation of .10, a difference in 
means [d] of .20 of a standard deviation). This size is commonly considered 
to be a "small" effect from the standpoint of practical significance 
(J. Cohen, 1988). 
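
The equivalence of the two example thresholds can be checked with the usual conversion between a correlation and a standardized mean difference, which assumes two groups of equal size (an assumption only approximately met by the 2,318 women and 2,033 men here):

```latex
r = .10 \;\Rightarrow\; r^{2} = .01 \quad (1\% \text{ of the variance}),
\qquad
d = \frac{2r}{\sqrt{1 - r^{2}}} = \frac{.20}{\sqrt{.99}} \approx .20 .
```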

The internal-consistency reliabilities of the actual and adjusted GPAs 
were estimated. For the actual GPA, within-subjects GPA, original regression 
GPA, modified regression GPA, and grade-residual GPA, GPAs for "odd" and 
"even" halves of the course grades were obtained, and reliability was 
estimated by the Spearman-Brown formula from the correlations between the two 
GPAs. For the IRT GPA, GPAs for each of three fields were obtained, weighting 
each standardized theta by the corresponding number of credit hours; 
reliability was estimated by the Spearman-Brown formula from the mean 
intercorrelation between the three GPAs. And for the imputed GPA, reliability 
was estimated by Coefficient Alpha. 
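
For reference, the Spearman-Brown formula steps up the correlation among k parallel parts to the reliability of their composite: k r / (1 + (k - 1) r). A minimal sketch follows, with k = 2 for the odd-even halves and k = 3 for the three IRT fields; the function name and the input value of .50 are illustrative only.

```python
def spearman_brown(r, k=2):
    """Reliability of a composite of k parallel parts with mean intercorrelation r."""
    return k * r / (1 + (k - 1) * r)

split_half_demo = spearman_brown(0.5)        # two halves correlating .50
three_part_demo = spearman_brown(0.5, k=3)   # three parts with mean r of .50
```

Read in reverse, a stepped-up reliability of .70 from two halves would correspond to a half-to-half correlation of about .54 under this formula.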



Results 

Intercorrelations of Actual and Adjusted GPAs 

The intercorrelations and internal-consistency reliabilities for the 
actual and adjusted GPAs appear in Table 1. All the GPAs, including actual 
GPA, correlated highly with each other (.91 to .99), but the IRT GPA 
consistently correlated lower than the others (.91 to .94). 

The GPAs' reliabilities varied considerably (.64 to .99). The imputed 
GPA (.99), grade residual GPA (.77), original regression GPA (.76), and 
modified regression GPA (.76) had higher reliabilities than actual GPA (.70), 
and the IRT GPA (.64) had a lower reliability; the reliability of the 
within-subjects GPA (.71) was similar to that of actual GPA. 



Insert Table 1 about here 



Correlations of SAT Scores and HSR with Actual and Adjusted CPAs 

The zero-order and multiple correlations of SAT scores and HSR with 
actual and adjusted GPAs appear in Table 2. The SAT scores and HSR generally 
correlated higher with the adjusted GPAs than with the actual GPA. The 
original Stricker et al. GPA was an exception: the correlations of SAT scores 
and HSR with it were close to those with actual GPA. The correlations with 
the other adjusted GPAs were generally similar. 

The predicted GPA correlated .56 (p < .01) with actual GPA, somewhat 
larger than the corresponding multiple correlation of .42 for SAT scores and 
HSR with this criterion. Note that the former correlation is inflated because 
actual grades in individual courses are used as criteria in the process of 
obtaining predicted grades for these courses, and these same actual course 
grades, in turn, are the components of actual GPA. 



Insert Table 2 about here 



Intercorrelations of GPA Over-Underprediction Measures 

The intercorrelations of the actual and adjusted GPAs residualized for 
SAT appear in Table 3. Paralleling the intercorrelations of actual and 
adjusted GPAs, all these residualized GPAs correlated highly with each other 
(.90 to .99), but the IRT measure consistently correlated lower with the 
others (.90 to .93). 



Insert Table 3 about here 



The corresponding intercorrelations of the actual and adjusted GPAs 
residualized for SAT and HSR are shown in Table 4. The predicted GPA 
residualized measure also appears in this table. All the residualized GPAs, 
including the predicted GPA measure, correlated highly (.88 to .99). But the 
IRT measure (.88 to .92) and the predicted GPA measure (.88 to .96) correlated 
lower than the others. 



Insert Table 4 about here 



Sex Differences in GPA Over-Underprediction Measures 

The mean actual and adjusted GPAs residualized for SAT scores are 
reported in Table 5 for women and men; the corresponding statistics for the 
GPAs residualized for SAT scores and HSR appear in Table 6, together with the 
statistics for the predicted GPA measure. For comparison, the mean actual and 
adjusted GPAs for both sexes appear in Table 7. Note that the d indexes for 
differences between the means for women and men and for the statistical 
significance of these differences are inflated for imputed GPA and the imputed 
GPA residualized measures because the variability of imputed GPA is attenuated 
by the imputation process (Little & Rubin, 1987); the actual differences 
between the means for these variables are unaffected. In addition, the actual 
differences between the means for IRT GPA and the IRT GPA residualized 
measures are not comparable to those for actual GPA and other adjusted GPAs 
because IRT GPA is not on the same 1-4 grade scale; the d indexes for these 
IRT GPA variables are comparable. And the differences between the means for 
the predicted GPA residualized measure are underestimated for the reasons 
mentioned previously. 

All the sex differences for actual and adjusted GPAs (Table 7) were small 
(actual differences of .01 to .09; ds of .00 to .11), and most were not 
significant (p > .01), with the exception of actual GPA and the original 
regression GPA. All the adjusted GPAs, except the original regression GPA, 
had smaller sex differences than actual GPA. 

All the sex differences for actual and adjusted GPAs residualized for SAT 
(Table 5) were small (actual differences of -.09 to -.21; ds of -.21 to -.26) 
but statistically significant (p < .01). The sex differences were generally 
similar for the actual and adjusted GPA measures, but were somewhat smaller 
for the imputed GPA measure. 

All the corresponding differences for actual and adjusted GPAs 
residualized for SAT and HSR (Table 6) were substantially smaller than those 
for GPAs residualized for SAT. The sex differences were small (actual 
differences of -.04 to -.11; ds of -.05 to -.15) and, except for the predicted 
GPA measure, were statistically significant (p < .01). All the adjusted GPA 
measures, except the original regression measure, displayed substantially 
smaller sex differences than the actual GPA measure. The other adjusted GPA 
measures generally had similar sex differences, but the differences for the 
imputed GPA and predicted GPA measures were somewhat smaller. 



Insert Tables 5 to 7 about here 



Discussion 

Psychometric Properties of Adjusted GPAs 

A central finding is that most of the methods for adjusting GPA, with the 
exception of the original regression procedure, functioned similarly and, 
despite high correlations with actual GPA, operated differently from it. The 
adjusted GPAs generally appeared to be psychometrically superior in 
reliability and, on the basis of their predictability from SAT and HSR, in 
validity. The evidence on the latter point is only suggestive and needs to be 
confirmed by further investigations of the comparative validity of adjusted 
and actual GPA, for it is at least conceivable that the enhanced 
predictability of the adjusted GPAs could come about for reasons extraneous to 
academic success (e.g., the common effects of test anxiety on both the 
admissions measures and the adjusted grades) that are unintentionally 
magnified by the adjustment process. 

It should be recognized that the adjustment methods are not free from 
problems. Most of the methods, with the exception of the original regression 
and imputed procedures, directly or indirectly emphasize general ability, and 
hence may make inadequate adjustments for grades in courses that demand 
special abilities or interests, such as courses in the arts, or involve 
unusually superior or inferior instruction (Strenta & Elliott, 1987). Insofar 
as there are many such courses, the validity of the adjusted GPA will be 
affected. 

In addition, though actual GPA is far from perfect from a psychometric 
perspective, its flaws should not be overstated (Etaugh, Etaugh, & Hurd, 
1972). Actual GPA's reliability is substantial, its predictability is 
appreciable, and it is factorially simple. (Schoenfeldt and Brush, 1975, 
found a large general factor and a smaller agriculture and education factor in 
an analysis of cumulative GPAs over 13 quarters in 12 fields; Young, 1990a, 
1990b, identified two group factors, natural sciences and engineering, and 
social sciences and humanities, in an analysis of freshmen grades in 127 
courses.) 

All the methods, with the possible exception of the original regression 
procedure, appear promising, given the limited data currently available about 
them. Of these methods, the grade residual procedure is the most desirable 
from the standpoint of computational simplicity. 

The limited effectiveness of both the original regression method and the 
new regression method in adjusting grades probably stems from their use of 
variables that are only indirectly and weakly related to college grades. (The 
multiple correlations of the variables with the mean course grades were .47 
for the original regression method and .31 for the new regression method.) 
All the other methods, in contrast, relied on either college grades or SAT 
scores and HSR, much more potent variables. 

Incidentally, the similarity in functioning of the original regression 
method, which used major-specific variables, and the new regression method, 
which did not, indicates that undercorrecting grades for department 
differences was not the explanation for the original regression method's 
minimal success in adjusting grades. 

The present results are generally similar to those obtained in previous 
studies. The reliability for actual GPA is in the same range as the internal- 
consistency estimates for freshman GPA observed earlier (Barritt, 1966; Clark, 
1964; Etaugh et al., 1972; Millman, Slovacek, Kulick, & Mitchell, 1983; Ramist 
et al., 1990; Singleton & Smith, 1978). But the failure of the 
within-subjects method to enhance reliability appears inconsistent with the 
higher correlations, observed by Elliott and Strenta (1988), between 
academic-year GPAs based on this method. 

The high correlation between the IRT GPA and actual GPA resembles the 
equally high correlations, reported by Young (1991), between GPA adjusted by 
this method and actual GPA (both GPAs for four academic years). 

The higher correlations of SAT-V and SAT-M with the within-subjects GPA 
than with the actual GPA are consistent with the higher correlations of total 
SAT scores with the four-year GPA adjusted by this method that Elliott and 
Strenta (1988) reported. And the higher correlations of SAT scores and HSR 
with the IRT GPA than with actual GPA are comparable to the higher 
correlations of SAT scores and high school GPA with four-year GPA adjusted by 
this method, observed by Young (1990b; 1991). 

Future efforts at adjusting grades might benefit from combining features 
of the internal and external methods. One obvious approach is to obtain 
adjusted grades with an internal method (within-subjects, IRT, or imputed), 
and then modify these grades with an external method (original regression, 
modified regression, or grade residual). An alternative is to incorporate 
auxiliary information about the courses or the students in making IRT 
estimates of course performance (Embretson [Whitely], 1984; Mislevy, 1987) or 
in imputing course grades. This research might also profit from using a broad 
range of course variables (including characteristics of their instructors and 
the students enrolled in them) to adjust grades (Frisbee, 1984; Prather & 
Smith, 1976). 

Sex Differences in GPA Over-Underprediction 

Another important outcome was that the various adjusted GPAs generally 
functioned similarly with regard to sex differences in over-underprediction. 
(The original regression method was an exception.) The methods reduced or 
even eliminated differences in GPA. However, they failed to narrow 
over-underprediction by SAT scores, though they did cut over-underprediction 
by SAT scores and HSR. The greater effectiveness of the grade adjustments in 
reducing over-underprediction by SAT scores and HSR is intriguing but cannot 
be explained at this point. It is evident, though, that the grade adjustments 
and HSR operate independently to reduce the over-underprediction and reflect 
different processes at work in students' course selection and grade getting. 

The considerably smaller amount of over-underprediction when HSR was 
added to SAT scores reflects the incremental validity of HSR in predicting 
grades. Given the limited amount of over-underprediction in this situation, 
the sizable reduction produced by the adjustments is remarkable. 

The small but significant sex differences in over-underprediction 
associated with all the adjusted GPAs clarify inconsistencies in the findings 
between the Stricker et al. (1991) study and other investigations. In the 
Stricker et al. investigation, modest but statistically significant sex 
differences in over-underprediction were found with the original regression 
procedure, in contrast with insignificant sex differences in 
over-underprediction observed with the within-subjects and the IRT methods in 
studies at other universities (Elliott & Strenta, 1988; Young, 1991). The 
uniformly significant sex differences in over-underprediction in the present 
study, regardless of the adjustment method used, reinforce the Stricker et al. 
contention that the discrepant outcomes in their investigation and in the 
previous studies are not attributable to the various methods employed, but 
probably reflect institutional differences. 

Predicted GPA Measure 

The functioning of the predicted GPA measure was striking. It had 
appreciably greater effectiveness than the SAT scores and HSR in predicting 
actual GPA, though it was based on these measures, and exhibited no 
significant sex differences in over-underprediction in contrast to the 
significant, though small, differences in over-underprediction displayed by 
the SAT and HSR. 

These estimates of the effectiveness of the predicted GPA measure are 
inflated by capitalization on chance, but they still indicate this measure's 
potential. The measure obviously needs to be cross validated to obtain 
precise estimates of its effectiveness. The present equations for course 
grade predictions could be applied to the same courses in a subsequent year; 
alternatively, new prediction equations could be obtained, using half of the 
study cohort, and applied to the other half. 

From the standpoint of improving the prediction of academic performance, 
it remains to be seen whether this method will prove to be as effective in 
dealing with differences in grading standards by modifying the predictor as 
grade adjustment methods are by modifying the criterion. 

Footnote 

¹More precisely, this is the weighted percentage of those taking (or 
planning to take) courses in six disciplines (Arts and Music, English, Foreign 
and Classical Languages, Mathematics, Natural Sciences, Social Sciences and 
History) who were (or planned to be) in honors, advanced placement, or 
accelerated courses. 

Table 2 

Correlations of SAT Scores and HSR with Actual and Adjusted GPAs 

                                     SAT Scores and HSR
                      -------------------------------------------------
                                                 SAT-V,      SAT-V,
GPA                   SAT-V    SAT-M    HSR      SAT-Mᵃ      SAT-M, HSRᵇ
------------------------------------------------------------------------
Actual                 .33      .30     .34       .36         .42
Within subjects        .33      .38     .41       .42         .50
IRT                    .32      .36     .41       .40         .49
Imputed                .32      .35     .40       .39         .47
Original regression    .32      .31     .36       .37         .44
Modified regression    .31      .35     .37       .38         .45
Grade residual         .35      .40     .41       .43         .51
------------------------------------------------------------------------

Note. Ns vary from 4,267 to 4,268 for SAT-V and SAT-M zero-order correlations 
and for SAT-V and SAT-M multiple correlations, from 3,989 to 3,990 for HSR 
zero-order correlations, and from 3,965 to 3,966 for SAT-V, SAT-M, and HSR 
multiple correlations. All correlations are significant at the .01 level 
(two-tail for zero-order correlations). 

ᵃThis is the multiple correlation for SAT-V and SAT-M. 

ᵇThis is the multiple correlation for SAT-V, SAT-M, and HSR. 